arXiv:1509.05954v1 [q-fin.ST] 20 Sep 2015


Mean-Reverting Portfolios: 
Tradeoffs Between Sparsity and Volatility 

Marco Cuturi 

Graduate School of Informatics 
Kyoto University 
mcuturi@i.kyoto-u.ac.jp

Alexandre d’Aspremont 
D.I., UMR CNRS 8548 

École Normale Supérieure, aspremon@ens.fr
September 22, 2015 


Abstract 


Mean-reverting assets are one of the holy grails of financial markets: 
if such assets existed, they would provide trivially profitable investment 
strategies for any investor able to trade them, thanks to the knowledge 
that such assets oscillate predictably around their long term mean. The 
modus operandi of cointegration-based trading strategies [Tsay, 2005, §8]
is to first create a portfolio of assets whose aggregate value mean-reverts,
then to exploit that knowledge by selling short or buying that portfolio when
its value deviates from its long-term mean. Such portfolios are typically
selected using tools from cointegration theory [Engle and Granger, 1987],
whose aim is to detect combinations of assets that are
stationary, and therefore mean-reverting. We argue in this work that
focusing on stationarity only may not suffice to ensure profitability of
cointegration-based strategies. While it might be possible to create
synthetically, using a large array of financial assets, a portfolio whose
aggregate value is stationary and therefore mean-reverting, trading such a large
portfolio incurs in practice significant trade or borrow costs. Looking for
stationary portfolios formed by many assets may also result in portfolios
that have a very small volatility and which require significant leverage to
be profitable. We study in this work algorithmic approaches that can
mitigate these effects by searching for maximally mean-reverting portfolios
which are sufficiently sparse and/or volatile.

1 Introduction


Mean-reverting assets, namely assets whose price oscillates predictably around 
a long-term mean, provide investors with an ideal investment opportunity.
Because of their tendency to pull back to a given price level, a naive contrarian
strategy of buying the asset when its price lies below that mean, or selling short 
the asset when it lies above that mean can be profitable. Unsurprisingly, assets 
that exhibit significant mean-reversion are very hard to find in efficient
markets. Whenever mean-reversion is observed in a single asset, it is almost always
impossible to profit from it: the asset may typically have very low volatility, 
be illiquid, hard to short-sell, or its mean-reversion may occur at a time-scale 
(months, years) for which the borrow-cost of holding or shorting the asset may 
well exceed any profit expected from such a contrarian strategy. 

1.0.1 Synthetic Mean-Reverting Baskets 

Since mean-reverting assets rarely appear in liquid markets, investors have
focused instead on creating synthetic assets that can mimic the properties of a
single mean-reverting asset, and trading such synthetic assets as if they were 
a single asset. Such a synthetic asset is typically designed by combining long 
and short positions in various liquid assets to form a mean-reverting portfolio, 
whose aggregate value exhibits significant mean-reversion. 

Constructing such synthetic portfolios is, however, challenging. Whereas 
simple descriptive statistics and unit-root test procedures can be used to test 
whether a single asset is mean-reverting, building mean-reverting portfolios
requires finding a proper vector of algebraic weights (long and short positions)
that describes a portfolio which has a mean-reverting aggregate value. In that 
sense, mean-reverting portfolios are made by the investor, and cannot be simply 
chosen among tradable assets. A mean-reverting portfolio is characterized both 
by the pool of assets the investor has selected (starting with the dimension of 
the vector), and by the fixed nominal quantities (or weights) of each of these 
assets in the portfolio, which the investor also needs to set. When only two
assets are considered, such baskets are usually known as long-short trading pairs.
We consider in this paper baskets that are constituted by more than two assets. 


1.0.2 Mean-Reverting Baskets with Sufficient Volatility and Sparsity

A mean-reverting portfolio must exhibit sufficient mean-reversion to ensure that
a contrarian strategy is profitable. To meet this requirement, investors have
relied on cointegration theory [Engle and Granger, 1987, Maddala and Kim,
1998, Johansen, 2005] to estimate linear combinations of assets which exhibit
stationarity (and therefore mean-reversion) using historical data. We argue
in this work, as we did in earlier references [d’Aspremont, 2011, Cuturi and
d’Aspremont, 2013], that mean-reverting strategies cannot, however, only rely
on this approach to be profitable. Arbitrage opportunities can only exist if they
are large enough to be traded without using too much leverage or incurring too
many transaction costs. For mean-reverting baskets, this condition translates
naturally into a first requirement that the gap between the basket valuation
and its long-term mean is large enough on average, namely that the basket
price has sufficient variance or volatility. A second desirable property is that
mean-reverting portfolios require trading as few assets as possible to minimize
costs, namely that the weights vector of that portfolio is sparse. We propose in
this work methods that maximize a proxy for mean reversion, and which can
take into account at the same time constraints on variance and sparsity.

We first propose in Section 2 three proxies for mean reversion. Section 3 defines
the basket optimization problems corresponding to these quantities. We show
in Section 4 that each of these problems translates naturally into semidefinite
relaxations which produce either exact or approximate solutions using sparse
PCA techniques. Finally, we present numerical evidence in Section 5 that
taking into account sparsity and volatility can significantly boost the performance
of mean-reverting trading strategies in trading environments where trading costs
are not negligible.


2 Proxies for Mean-Reversion 


Isolating stable linear combinations of variables of multivariate time series is a
fundamental problem in econometrics. A classical formulation of the problem
reads as follows: given a vector-valued process $x = (x_t)_t$ taking values in $\mathbb{R}^n$ and
indexed by time $t \in \mathbb{N}$, and making no assumptions on the stationarity of each
individual component of $x$, can we estimate one or many directions $y \in \mathbb{R}^n$ such
that the univariate process $(y^T x_t)$ is stationary? When such a vector $y$ exists,
the process $x$ is said to be cointegrated. The goal of cointegration techniques is
to detect and estimate such directions $y$. Taken for granted that such techniques
can efficiently isolate sparse mean-reverting baskets, their financial application
can be either straightforward, using simple event triggers to buy, sell or simply
hold the basket [Tsay, 2005, §8.6], or rely on more elaborate optimal trading strategies
if one assumes that the mean-reverting basket value is an Ornstein-Uhlenbeck
process, as discussed in [Jurek and Yang, 2007, Liu and Timmermann, 2010,
Elie and Espinosa, 2011].


2.1 Related Work and Problem Setting 

Engle and Granger [1987] provided in their seminal work a first approach to
compare two non-stationary univariate time series $(x_t, y_t)$, and test for the
existence of a term $\alpha$ such that $y_t - \alpha x_t$ becomes stationary. Following this seminal
work, several techniques have been proposed to generalize that idea to
multivariate time series. As detailed in the survey by Maddala and Kim [1998, §5],
cointegration techniques differ in the modeling assumptions they require on the
time series themselves. Some are designed to identify only one cointegrated
relationship, whereas others are designed to detect many or all of them. Among
these references, Johansen [1991] proposed a popular approach that builds upon
a VAR model, as surveyed in [Johansen, 2005, 2004]. These approaches all
discuss issues that are relevant to econometrics, such as de-trending and seasonal
adjustments. Some of them focus more specifically on testing procedures
designed to check whether such cointegrated relationships exist or not, rather than
on the robustness of the estimation of that relationship itself. We follow in this
work a simpler approach proposed by d’Aspremont [2011], which is to trade off
interpretability, testing and modeling assumptions for a simpler optimization
framework which can be tailored to include other aspects than only stationarity.
d’Aspremont [2011] did so by adding regularizers to the predictability criterion
proposed by Box and Tiao [1977]. We follow in this paper the approach we
proposed in Cuturi and d’Aspremont [2013] to design mean-reversion proxies
that do not rely on any modeling assumption.

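Throughout this paper, we write $S_n$ for the cone of $n \times n$ positive definite
matrices. We consider in the following a multivariate stochastic process
$x = (x_t)_{t \in \mathbb{N}}$ taking values in $\mathbb{R}^n$. We write
$\mathcal{A}_k = \mathbf{E}[x_t x_{t+k}^T]$, $k \geq 0$, for the lag-$k$
autocovariance matrix of $x_t$ if it is finite. Using a sample path $\mathbf{x}$ of $(x_t)_t$, where
$\mathbf{x} = (x_1, \ldots, x_T)$ and each $x_t \in \mathbb{R}^n$, we write $A_k$ for the empirical counterpart
of $\mathcal{A}_k$ computed from $\mathbf{x}$,

$$
A_k \overset{\mathrm{def}}{=} \frac{1}{T-k-1} \sum_{t=1}^{T-k} \tilde{x}_t \tilde{x}_{t+k}^T, \qquad \tilde{x}_t \overset{\mathrm{def}}{=} x_t - \frac{1}{T} \sum_{t=1}^{T} x_t. \tag{1}
$$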
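
As an illustration of the empirical autocovariance in Equation (1), the following minimal NumPy sketch centers a sample path and forms the lag-$k$ matrix with the $T-k-1$ normalization. The function name `empirical_autocovariance` and the row-per-date layout of the input are assumptions of this sketch, not part of the original text.

```python
import numpy as np

def empirical_autocovariance(x, k):
    """Lag-k empirical autocovariance of a sample path.

    x is assumed to have shape (T, n): one row per date, one column per asset.
    """
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    xc = x - x.mean(axis=0)                      # centered observations
    # sum over t of xc[t] xc[t+k]^T, divided by T - k - 1
    return xc[:T - k].T @ xc[k:] / (T - k - 1)
```

For $k = 0$ this coincides with the usual unbiased sample covariance, which gives a quick sanity check against `np.cov(x, rowvar=False)`.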


Given $y \in \mathbb{R}^n$, we now define three measures which can all be interpreted as
proxies for the mean reversion of $y^T x_t$. Predictability, defined for stationary
processes by Box and Tiao [1977] and generalized for non-stationary processes
by Bewley et al. [1994], measures how close to noise the series is. The
portmanteau statistic [Ljung and Box, 1978] is used to test whether a process is
white noise. Finally, the crossing statistic [Ylvisaker, 1965] measures the
probability that a process crosses its mean per unit of time. In all three cases,
low values for these criteria imply a fast mean-reversion.


2.2 Predictability 


We briefly recall the canonical decomposition derived in Box and Tiao [1977].
Suppose that $x_t$ follows the recursion:

$$
x_t = \hat{x}_{t-1} + \varepsilon_t, \tag{2}
$$

where $\hat{x}_{t-1}$ is a predictor of $x_t$ built upon past values of the process recorded up
to $t-1$, and $\varepsilon_t$ is a vector of i.i.d. Gaussian noise with zero mean and covariance
$\Sigma \in S_n$ independent of all variables $(x_r)_{r<t}$. The canonical analysis in Box and
Tiao [1977] starts as follows.
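
2.2.1 Univariate case


Suppose $n = 1$ and thus $\Sigma \in \mathbb{R}_+$. Equation (2) leads thus to

$$
\mathbf{E}[x_t^2] = \mathbf{E}[\hat{x}_{t-1}^2] + \mathbf{E}[\varepsilon_t^2], \quad \text{thus} \quad 1 = \frac{\hat{\sigma}^2}{\sigma^2} + \frac{\Sigma}{\sigma^2},
$$

by introducing the variances $\sigma^2$ and $\hat{\sigma}^2$ of $x_t$ and $\hat{x}_{t-1}$ respectively.
Box and Tiao measure the predictability of $x_t$ by the ratio

$$
\lambda \overset{\mathrm{def}}{=} \frac{\hat{\sigma}^2}{\sigma^2}.
$$

The intuition behind this variance ratio is simple: when it is small the variance
of the noise dominates that of $\hat{x}_{t-1}$ and $x_t$ is dominated by the noise term;
when it is large, $\hat{x}_{t-1}$ dominates the noise and $x_t$ can be accurately predicted
on average.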
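
As a worked illustration of this variance ratio, consider an assumed example that is not taken from Box and Tiao: a stationary AR(1) process $x_t = a x_{t-1} + \varepsilon_t$ with $|a| < 1$, noise variance $\Sigma$, and predictor $\hat{x}_{t-1} = a x_{t-1}$. Stationarity gives the variance of $x_t$, and the predictability then reduces to the squared autoregressive coefficient:

```latex
\sigma^2 = a^2 \sigma^2 + \Sigma
\;\Longrightarrow\;
\sigma^2 = \frac{\Sigma}{1 - a^2},
\qquad
\hat{\sigma}^2 = a^2 \sigma^2,
\qquad
\lambda = \frac{\hat{\sigma}^2}{\sigma^2} = a^2 .
```

In this example a coefficient $a$ close to 0 yields a series that is close to noise ($\lambda \approx 0$), whereas $|a|$ close to 1 yields a highly predictable series ($\lambda \approx 1$).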


2.2.2 Multivariate case 


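Suppose $n > 1$ and consider now the univariate process $(y^T x_t)_t$ with weights
$y \in \mathbb{R}^n$. Using (2) we know that $y^T x_t = y^T \hat{x}_{t-1} + y^T \varepsilon_t$, and we can measure its
predictability as

$$
\lambda(y) \overset{\mathrm{def}}{=} \frac{y^T \hat{\mathcal{A}}_0 y}{y^T \mathcal{A}_0 y}, \tag{3}
$$

where $\mathcal{A}_0$ and $\hat{\mathcal{A}}_0$ are the covariance matrices of $x_t$ and $\hat{x}_{t-1}$ respectively.
Minimizing predictability $\lambda(y)$ is then equivalent to finding the minimum generalized
eigenvalue $\lambda$ solving

$$
\det(\lambda \mathcal{A}_0 - \hat{\mathcal{A}}_0) = 0. \tag{4}
$$

Assuming that $\mathcal{A}_0$ is positive definite, the basket with minimum predictability
will be given by $y = \mathcal{A}_0^{-1/2} y_0$, where $y_0$ is the eigenvector corresponding to the
smallest eigenvalue of the matrix $\mathcal{A}_0^{-1/2} \hat{\mathcal{A}}_0 \mathcal{A}_0^{-1/2}$.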


2.2.3 Estimation of $\lambda(y)$


All of the quantities used to define $\lambda$ above need to be estimated from sample
paths. $\mathcal{A}_0$ can be estimated by $A_0$ following Equation (1). All other quantities
depend on the predictor $\hat{x}_{t-1}$. Box and Tiao assume that $x_t$ follows a vector
autoregressive model of order $p$, VAR($p$) in short, and therefore $\hat{x}_{t-1}$ takes
the form

$$
\hat{x}_{t-1} = \sum_{k=1}^{p} \mathcal{H}_k x_{t-k},
$$

where each of the $p$ matrices $(\mathcal{H}_k)$ contains $n \times n$ autoregressive coefficients.
Estimating $\mathcal{H}_k$ from the sample path $\mathbf{x}$, Box and Tiao solve for the optimal basket
by inserting these estimates in the generalized eigenvalue problem displayed
in Equation (4). If one assumes that $p = 1$ (the case $p > 1$ can be trivially
reformulated as a VAR(1) model with adequate reparameterization), then

$$
\hat{\mathcal{A}}_0 = \mathcal{H}_1 \mathcal{A}_0 \mathcal{H}_1^T \quad \text{and} \quad \mathcal{A}_1 = \mathcal{A}_0 \mathcal{H}_1,
$$

and thus the Yule-Walker estimator [Lütkepohl, 2005, §3.3] of $\mathcal{H}_1$ would be
$H_1 = A_0^{-1} A_1$. Minimizing predictability boils down to solving in that case

$$
\min_y \hat{\lambda}(y), \qquad \hat{\lambda}(y) \overset{\mathrm{def}}{=} \frac{y^T (H_1 A_0 H_1^T) y}{y^T A_0 y} = \frac{y^T (A_1 A_0^{-1} A_1^T) y}{y^T A_0 y},
$$

which is equivalent to computing the smallest eigenvector of the matrix
$A_0^{-1/2} A_1 A_0^{-1} A_1^T A_0^{-1/2}$
if the covariance matrix $A_0$ is invertible.
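
The eigenvalue computation described above can be sketched as follows, assuming a VAR(1) fit and an invertible empirical covariance. The function name `box_tiao_basket` and the final rescaling of the weights to unit Euclidean norm are choices made for this sketch only; it forms $A_1 A_0^{-1} A_1^T$, whitens it by $A_0^{-1/2}$, and maps the eigenvector of the smallest eigenvalue back to basket weights.

```python
import numpy as np

def box_tiao_basket(a0, a1):
    """Least-predictable basket under a VAR(1) fit (illustrative sketch).

    a0: (n, n) empirical covariance, assumed positive definite.
    a1: (n, n) empirical lag-1 autocovariance.
    Returns (smallest predictability value, unit-norm weights y).
    """
    m = a1 @ np.linalg.solve(a0, a1.T)           # A1 A0^{-1} A1^T
    w, v = np.linalg.eigh(a0)                    # A0 = V diag(w) V^T, w > 0
    a0_inv_sqrt = v @ np.diag(w ** -0.5) @ v.T   # A0^{-1/2}
    s = a0_inv_sqrt @ m @ a0_inv_sqrt
    evals, evecs = np.linalg.eigh((s + s.T) / 2)  # eigenvalues in ascending order
    y = a0_inv_sqrt @ evecs[:, 0]                # y = A0^{-1/2} y0
    return evals[0], y / np.linalg.norm(y)
```

Because the ratio $y^T (A_1 A_0^{-1} A_1^T) y / y^T A_0 y$ is invariant to rescaling $y$, normalizing the returned weights does not change the predictability value.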

The machinery of Box and Tiao to quantify mean-reversion requires defining
a model to form $\hat{x}_{t-1}$, the conditional expectation of $x_t$ given previous
observations. We consider in the following two criteria that do without such modeling
assumptions.


2.3 Portmanteau Criterion 


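Recall that the portmanteau statistic of order $p$ [Ljung and Box, 1978] of a
centered univariate stationary process $x$ (with $n = 1$) is given by

$$
\mathrm{por}_p(x) = \frac{1}{p} \sum_{i=1}^{p} \left( \frac{\mathbf{E}[x_t x_{t+i}]}{\mathbf{E}[x_t^2]} \right)^2,
$$

where $\mathbf{E}[x_t x_{t+i}] / \mathbf{E}[x_t^2]$ is the $i$th order autocorrelation of $x_t$. The portmanteau
statistic of a white noise process is by definition 0 for any $p$. Given a multivariate
($n > 1$) process $x$ we write

$$
\phi_p(y) = \mathrm{por}_p(y^T x) = \frac{1}{p} \sum_{i=1}^{p} \left( \frac{y^T \mathcal{A}_i y}{y^T \mathcal{A}_0 y} \right)^2,
$$

for a coefficient vector $y \in \mathbb{R}^n$. By construction, $\phi_p(y) = \phi_p(ty)$ for any $t \neq 0$
and in what follows, we will impose $\|y\|_2 = 1$. The quantities $\phi_p(y)$ are
computed using the following estimates [Hamilton, 1994, p. 110]:

$$
\hat{\phi}_p(y) = \frac{1}{p} \sum_{i=1}^{p} \left( \frac{y^T A_i y}{y^T A_0 y} \right)^2. \tag{5}
$$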
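
A minimal sketch of this estimate, assuming the empirical autocovariance matrices have already been computed and are passed as a list `[A_0, A_1, ..., A_p]` (the function name `portmanteau` and this calling convention are assumptions of the sketch): the statistic is the average of the squared sample autocorrelations of the basket $y^T x$ at lags 1 to $p$.

```python
import numpy as np

def portmanteau(y, autocovs):
    """Average squared sample autocorrelation of the basket y^T x at lags 1..p.

    autocovs is assumed to be the list [A_0, A_1, ..., A_p] of (n, n) matrices.
    """
    y = np.asarray(y, dtype=float)
    var = y @ autocovs[0] @ y                    # basket variance y^T A_0 y
    p = len(autocovs) - 1
    return sum((y @ a @ y / var) ** 2 for a in autocovs[1:]) / p
```

The value is unchanged when `y` is rescaled, which is why the weights can be normalized to unit norm without loss of generality.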


2.4 Crossing Statistics 

Kedem and Yakowitz [1994, §4.1] define the zero crossing rate of a univariate
($n = 1$) process $x$ (its expected number of crosses around 0 per unit of time) as

$$
\gamma(x) = \mathbf{E}\left[ \frac{\sum_{t=2}^{T} \mathbf{1}_{\{x_t x_{t-1} \leq 0\}}}{T-1} \right]. \tag{6}
$$

A result known as the cosine formula states that if $x_t$ is an autoregressive process
of order one AR(1), namely if $|a| < 1$, $\varepsilon_t$ is i.i.d. standard Gaussian noise and
$x_t = a x_{t-1} + \varepsilon_t$, then [Kedem and Yakowitz, 1994, §4.2.2]:

$$
\gamma(x) = \frac{\arccos(a)}{\pi}.
$$

Hence, for AR(1) processes, minimizing the first order autocorrelation $a$ also
directly maximizes the crossing rate of the process $x$. For $n > 1$, since the first
order autocorrelation of $y^T x_t$ is equal to $y^T A_1 y$, we propose to minimize $y^T A_1 y$
and ensure that all other absolute autocorrelations $|y^T A_k y|$, $k > 1$, are small.
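
The cosine formula, which gives a crossing rate of $\arccos(a)/\pi$ for a Gaussian AR(1) process with coefficient $a$, can be checked numerically. The sketch below simulates such a process and compares the empirical fraction of sign changes with that value; the function names, the seed and the sample size are arbitrary choices for illustration.

```python
import numpy as np

def zero_crossing_rate(x):
    """Fraction of consecutive pairs (x[t-1], x[t]) whose product is <= 0."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x[1:] * x[:-1] <= 0))

def simulate_ar1(a, T, seed=0):
    """Sample path of x_t = a * x_{t-1} + eps_t with standard Gaussian noise."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    x = np.empty(T)
    x[0] = eps[0] / np.sqrt(1.0 - a ** 2)        # draw x_0 from the stationary law
    for t in range(1, T):
        x[t] = a * x[t - 1] + eps[t]
    return x

for a in (-0.5, 0.0, 0.5, 0.9):
    rate = zero_crossing_rate(simulate_ar1(a, 100_000))
    print(f"a={a:+.1f}  empirical={rate:.3f}  arccos(a)/pi={np.arccos(a) / np.pi:.3f}")
```

Lower values of $a$ give higher crossing rates, which is the monotone relationship the crossing criterion relies on.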


3 Optimal Baskets 

Given a centered multivariate process $x$, we form its covariance matrix $A_0$ and
its $p$ autocovariances $(A_1, \ldots, A_p)$. Because $y^T A y = y^T (A + A^T) y / 2$, we
symmetrize all autocovariance matrices $A_i$. We investigate in this section the
problem of estimating baskets that have maximal mean reversion (as measured by
the proxies proposed in Section 2), while being at the same time sufficiently
volatile and supported by as few assets as possible. The latter will be achieved
by selecting portfolios $y$ that have a small “0-norm”, namely that the number
of non-zero components in $y$,

$$
\|y\|_0 \overset{\mathrm{def}}{=} \#\{1 \leq i \leq n \mid y_i \neq 0\},
$$

is small. The former will be achieved by selecting portfolios whose aggregated
value exhibits a variance over time that exceeds a given threshold $\nu > 0$. Note
that for the variance of $(y^T x_t)$ to exceed a level $\nu$, the largest eigenvalue of
$A_0$ must necessarily be larger than $\nu$, which we always assume in what follows.
Combining these two constraints, we propose three different mathematical
programs that reflect these trade-offs.


3.1 Minimizing Predictability 


Minimizing Box-Tiao’s predictability $\lambda$ defined in §2.2, while ensuring both
that the variance of the resulting process exceeds $\nu$ and that the vector of loadings
is sparse with a 0-norm equal to $k$, means solving the following program:

$$
\begin{array}{ll}
\text{minimize} & y^T M y \\
\text{subject to} & y^T A_0 y \geq \nu, \\
& \|y\|_2 = 1, \\
& \|y\|_0 = k,
\end{array} \tag{P1}
$$

in the variable $y \in \mathbb{R}^n$ with $M \overset{\mathrm{def}}{=} A_1 A_0^{-1} A_1^T$, where $M, A_0 \in S_n$. Without the
normalization constraint $\|y\|_2 = 1$ and the sparsity constraint $\|y\|_0 = k$,
problem (P1) is equivalent to a generalized eigenvalue problem in the pair $(M, A_0)$.
That problem quickly becomes unstable when $A_0$ is ill-conditioned or $M$ is
singular. Adding the normalization constraint $\|y\|_2 = 1$ solves these numerical
problems.


3.2 Minimizing the Portmanteau Statistic 


Using a similar formulation, we can also minimize the order $p$ portmanteau
statistic defined in §2.3 while ensuring a minimal variance level $\nu$ by solving:

$$
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^{p} \left( y^T A_i y \right)^2 \\
\text{subject to} & y^T A_0 y \geq \nu, \\
& \|y\|_2 = 1, \\
& \|y\|_0 = k,
\end{array} \tag{P2}
$$

in the variable $y \in \mathbb{R}^n$, for some parameter $\nu > 0$. Problem (P2) has a
natural interpretation: the objective function directly minimizes the portmanteau
statistic, while the constraints normalize the norm of the basket weights to one,
impose a variance larger than $\nu$ and impose a sparsity constraint on $y$.


3.3 Minimizing the Crossing Statistic 


Following the results in §2.4, maximizing the crossing rate while keeping the
rest of the autocorrelogram low,

$$
\begin{array}{ll}
\text{minimize} & y^T A_1 y + \mu \sum_{k=2}^{p} \left( y^T A_k y \right)^2 \\
\text{subject to} & y^T A_0 y \geq \nu, \\
& \|y\|_2 = 1, \\
& \|y\|_0 = k,
\end{array} \tag{P3}
$$

in the variable $y \in \mathbb{R}^n$, for some parameters $\mu, \nu > 0$, will produce processes
that are close to being AR(1), while having a high crossing rate.
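
To make the three programs concrete, the sketch below evaluates their objectives for a candidate weight vector and checks the constraints they share (minimum variance, unit Euclidean norm, fixed number of non-zero weights). It does not solve the programs; the helper names, the list layout `[A_0, ..., A_p]` and the numerical tolerance are assumptions made for illustration.

```python
import numpy as np

def quad(y, a):
    """Quadratic form y^T a y."""
    return float(y @ a @ y)

def objectives(y, autocovs, mu=1.0):
    """Objective values of (P1), (P2) and (P3) at y.

    autocovs is assumed to be [A_0, A_1, ..., A_p], with A_0 positive definite.
    """
    a0, a1 = autocovs[0], autocovs[1]
    m = a1 @ np.linalg.solve(a0, a1.T)           # M = A_1 A_0^{-1} A_1^T
    p1 = quad(y, m)
    p2 = sum(quad(y, a) ** 2 for a in autocovs[1:])
    p3 = quad(y, a1) + mu * sum(quad(y, a) ** 2 for a in autocovs[2:])
    return p1, p2, p3

def is_feasible(y, a0, nu, k, tol=1e-8):
    """Variance, unit-norm and cardinality constraints shared by (P1)-(P3)."""
    return bool(quad(y, a0) >= nu - tol
                and abs(np.linalg.norm(y) - 1.0) <= tol
                and int(np.count_nonzero(y)) == k)
```

A brute-force baseline for very small universes could enumerate supports of size $k$ and compare feasible candidates with these functions, although this does not scale, which motivates the relaxations discussed next.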


4 Semidefinite Relaxations and Sparse Components


Problems (P1), (P2) and (P3) are not convex, and can be in practice extremely
difficult to solve, since they involve a sparse selection of variables. We detail in
this section convex relaxations to these problems which can be used to derive
relevant sub-optimal solutions.


4.1 A Semidefinite Programming Approach to Basket Estimation


We propose to relax problems (P1), (P2) and (P3) into Semidefinite Programs
(SDP) [Vandenberghe and Boyd, 1996]. We show that these semidefinite
programs can naturally handle sparsity and volatility constraints while still aiming
at mean-reversion. In some restricted cases, one can show that these relaxations
are tight, in the sense that they solve exactly the programs described above. In
such cases, the true solution $y^*$ of some of the programs above can be recovered
using their corresponding SDP solution $Y^*$.

However, in most of the cases we will be interested in, such a correspondence
is not guaranteed and these SDP relaxations can only serve as a guide to propose
solutions to these hard non-convex problems when considered with respect to
vector $y$. To do so, the optimal solution $Y^*$ needs to be deflated from a large
rank $n \times n$ matrix to a rank one matrix $yy^T$ where $y$ can be considered a good
candidate for basket weights. A typical approach to deflate a positive definite
matrix into a vector is to consider its eigenvector with the leading eigenvalue.
Having sparsity constraints in mind, we propose to apply a heuristic grounded on
sparse PCA [Zou et al., 2006, d’Aspremont et al., 2007]. Instead of considering
the lead eigenvector, we recover the leading sparse eigenvector of $Y^*$ (with a
0-norm constrained to be equal to $k$). Several efficient algorithmic approaches
have been proposed to solve approximately that problem; we use the SPASM
toolbox [Sjöstrand et al., 2012] in our experiments.
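
The deflation step can be sketched as follows. This is not the SPASM routine used in the experiments: as a crude stand-in for a sparse PCA solver, the sketch keeps the `k` largest-magnitude entries of the leading eigenvector and renormalizes. The function name `deflate` and this truncation rule are assumptions for illustration only.

```python
import numpy as np

def deflate(Y, k=None):
    """Extract candidate basket weights from a positive semidefinite matrix Y.

    With k=None, returns the leading eigenvector. Otherwise, keeps the k
    largest-magnitude entries of that eigenvector (a simple truncation
    heuristic, not a sparse PCA solver) and renormalizes to unit norm.
    """
    Y = np.asarray(Y, dtype=float)
    w, v = np.linalg.eigh((Y + Y.T) / 2)         # eigenvalues in ascending order
    y = v[:, -1].copy()                          # leading eigenvector
    if k is not None:
        keep = np.argsort(np.abs(y))[-k:]        # indices of the k largest entries
        mask = np.zeros(y.shape, dtype=bool)
        mask[keep] = True
        y[~mask] = 0.0
    return y / np.linalg.norm(y)
```

When `Y` is exactly rank one, the leading eigenvector recovers $y$ up to sign; otherwise the output is only a candidate whose variance and mean-reversion should be re-evaluated.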


4.2 Predictability 


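We can form a convex relaxation of the predictability optimization problem (P1)
over the variable $y \in \mathbb{R}^n$,

$$
\begin{array}{ll}
\text{minimize} & y^T M y \\
\text{subject to} & y^T A_0 y \geq \nu, \\
& \|y\|_2 = 1, \\
& \|y\|_0 = k,
\end{array}
$$

by using the lifting argument of Lovász and Schrijver [1991], i.e. writing
$Y = yy^T$ to solve now the problem using a semidefinite variable $Y$, and by
introducing a sparsity-inducing regularizer on $Y$ which considers the $L_1$ norm
of $Y$,

$$
\|Y\|_1 \overset{\mathrm{def}}{=} \sum_{ij} |Y_{ij}|,
$$

so that Problem (P1) becomes (here $\rho > 0$),

$$
\begin{array}{ll}
\text{minimize} & \mathrm{Tr}(MY) + \rho \|Y\|_1 \\
\text{subject to} & \mathrm{Tr}(A_0 Y) \geq \nu, \\
& \mathrm{Tr}(Y) = 1, \ \mathrm{Rank}(Y) = 1, \ Y \succeq 0.
\end{array}
$$

We relax this last problem further by dropping the rank constraint, to get

$$
\begin{array}{ll}
\text{minimize} & \mathrm{Tr}(MY) + \rho \|Y\|_1 \\
\text{subject to} & \mathrm{Tr}(A_0 Y) \geq \nu, \\
& \mathrm{Tr}(Y) = 1, \ Y \succeq 0,
\end{array} \tag{SDP1}
$$

which is a convex semidefinite program in $Y \in S_n$.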
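
The lifting argument rests on a few identities linking a unit-norm vector $y$ to the matrix $Y = yy^T$, which the following sketch verifies numerically on random inputs. The random matrix `M` merely stands in for the matrix of the objective, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
y = rng.standard_normal(n)
y /= np.linalg.norm(y)                           # unit Euclidean norm
B = rng.standard_normal((n, n))
M = B @ B.T                                      # arbitrary PSD matrix standing in for M

Y = np.outer(y, y)                               # lifted variable Y = y y^T

# Quadratic forms in y become linear functions of Y.
assert np.isclose(y @ M @ y, np.trace(M @ Y))
# The unit-norm constraint on y becomes a trace constraint on Y.
assert np.isclose(np.trace(Y), 1.0)
# Y is positive semidefinite and has rank one.
assert np.linalg.matrix_rank(Y) == 1
assert np.linalg.eigvalsh(Y).min() > -1e-12
# The entrywise L1 norm of Y equals the squared L1 norm of y,
# so penalizing it encourages sparse y.
assert np.isclose(np.abs(Y).sum(), np.abs(y).sum() ** 2)
```

Dropping the rank-one requirement is the only step that changes the feasible set, which is why the relaxed solution generally has to be deflated back into a vector.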

4.3 Portmanteau


Using the same lifting argument and writing $Y = yy^T$, we can relax
problem (P2) by solving

$$
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^{p} \mathrm{Tr}(A_i Y)^2 + \rho \|Y\|_1 \\
\text{subject to} & \mathrm{Tr}(A_0 Y) \geq \nu, \\
& \mathrm{Tr}(Y) = 1, \ Y \succeq 0,
\end{array} \tag{SDP2}
$$

a semidefinite program in $Y \in S_n$.


4.4 Crossing Stats 


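As above, we can write a semidefinite relaxation for problem (P3):

$$
\begin{array}{ll}
\text{minimize} & \mathrm{Tr}(A_1 Y) + \mu \sum_{k=2}^{p} \mathrm{Tr}(A_k Y)^2 + \rho \|Y\|_1 \\
\text{subject to} & \mathrm{Tr}(A_0 Y) \geq \nu, \\
& \mathrm{Tr}(Y) = 1, \ Y \succeq 0.
\end{array} \tag{SDP3}
$$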


4.4.1 Tightness of the SDP Relaxation in the Absence of Sparsity Constraints

Note that for the crossing stats criterion (with $p = 1$ and no quadratic term in $Y$),
the original problem (P3) and its relaxation (SDP3) are equivalent, taken
for granted that no sparsity constraint is considered in the original problem
and $\mu$ is set to 0 in the relaxation. The relaxation boils down to an SDP that
only has a linear objective, a linear constraint and a constraint on the trace of
$Y$. In that case, Brickman [1961] showed that the range of two quadratic forms
over the unit sphere is a convex set when the ambient dimension $n \geq 3$, which
means in particular that for any two square matrices $A$, $B$ of dimension $n$

$$
\left\{ (y^T A y, y^T B y) : y \in \mathbb{R}^n, \ \|y\|_2 = 1 \right\} =
\left\{ (\mathrm{Tr}(AY), \mathrm{Tr}(BY)) : Y \in S_n, \ \mathrm{Tr}\, Y = 1, \ Y \succeq 0 \right\}.
$$

We refer the reader to [Barvinok, 2002, §II.13] for a more complete discussion
of this result. As remarked in Cuturi and d’Aspremont [2013], the same
equivalence holds for (P1) and (SDP1). This means that, in the case where $\rho, \mu = 0$ and
the 0-norm of $y$ is not constrained, for any solution $Y^*$ of the relaxation (SDP1)
there exists a vector $y^*$ which satisfies $\|y^*\|_2 = \mathrm{Tr}(Y^*) = 1$, $y^{*T} A_0 y^* = \mathrm{Tr}(A_0 Y^*)$
and $y^{*T} M y^* = \mathrm{Tr}(M Y^*)$, which means that $y^*$ is an optimal solution of the
original problem (P1). Boyd and Vandenberghe [2004, App. B] show how to
explicitly extract such a solution $y^*$ from a matrix $Y^*$ solving (SDP1). This result
is however mostly anecdotal in the context of this paper, in which we look for
sparse and volatile baskets: using these two regularizers breaks the tightness
result between the original problems of Section 3 and their SDP counterparts.

Figure 1: Option implied volatility for Apple (AAPL) between January 4 2004 and
December 30 2010.


5 Numerical Experiments 

In this section, we evaluate the ability of our techniques to extract
mean-reverting baskets with sufficient variance and small 0-norm from a universe of
tradable assets. We measure performance by applying to these baskets a trading
strategy designed specifically for mean-reverting processes. We show that, under
realistic trading cost assumptions, selecting sparse and volatile mean-reverting
baskets translates into lower incurred costs and thus improves the performance
of trading strategies.

5.1 Historical Data 

We consider daily time series of option implied volatilities for 210 stocks from
January 4 2004 to December 30 2010. A key advantage of using option implied
volatility data is that these numbers vary in a somewhat limited range.
Volatility also tends to exhibit regime switching, hence can be considered piecewise
stationary, which helps in extracting structural relationships. We plot a sample
time series from this dataset in Figure 1, which corresponds to the implied
volatility of Apple’s stock. In what follows, we mean by asset the implied volatility
of any of these stocks, whose value can be efficiently replicated using option
portfolios.


5.2 Mean-reverting Basket Estimators 


We compare the three basket selection techniques detailed here, predictability,
portmanteau and crossing statistic, implemented with varying targets for
both sparsity and volatility, with two cointegration estimators that build upon
principal component analysis [Maddala and Kim, 1998, §5.5.4]. By the label
‘PCA’ we mean in what follows the eigenvector with smallest eigenvalue of the
covariance matrix $A_0$ of the process [Stock and Watson, 1988]. By ‘sPCA’
we mean the sparse eigenvector of $A_0$ with 0-norm $k$ that has the smallest
eigenvalue, which can be simply estimated by computing the leading sparse
eigenvector of $\lambda I - A_0$ where $\lambda$ is bigger than the leading eigenvalue of $A_0$. This
sparse principal component of the covariance matrix $A_0$ should not be confused
with our use of sparse PCA in Section 4.1 as a way to recover a vector
solution from the solution of a positive semidefinite problem. Note also that
techniques based on principal components do not explicitly take variance levels
into account when estimating the weights of a cointegrated relationship.
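
The shift used for the sparse PCA baseline can be checked in the dense (non-sparse) case: the eigenvector of the covariance matrix with the smallest eigenvalue is the leading eigenvector of a shifted, sign-flipped matrix. The sketch below verifies this on synthetic data; the data, the variable names and the factor 1.1 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 6))                # synthetic data: 200 dates, 6 assets
A0 = np.cov(x, rowvar=False)

w, v = np.linalg.eigh(A0)                        # eigenvalues in ascending order
lam = 1.1 * w[-1]                                # any value above the leading eigenvalue
w_shift, v_shift = np.linalg.eigh(lam * np.eye(6) - A0)

smallest_of_A0 = v[:, 0]
leading_of_shift = v_shift[:, -1]
# Unit vectors spanning the same direction have |inner product| = 1.
alignment = abs(smallest_of_A0 @ leading_of_shift)
```

The shift keeps the matrix positive definite, so a solver that looks for a leading sparse eigenvector can be reused unchanged to look for a sparse direction of smallest variance.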


5.3 Jurek and Yang Trading Strategy 


While option implied volatility is not directly tradable, it can be synthesized
using baskets of call options, and we assimilate it to a tradable asset with
(significant) transaction costs in what follows. For baskets of volatilities isolated
by the techniques listed above, we apply the Jurek and Yang [2007] strategy for
log utilities to the basket process, recording out-of-sample performance. Jurek
and Yang propose to trade a stationary autoregressive process $(x_t)_t$ of order 1
and mean $\mu$ governed by the equation $x_{t+1} = \rho x_t + \sigma \varepsilon_t$, where $|\rho| < 1$, by taking
a position $N_t$ in the asset $x_t$ which is proportional to


$$
N_t = [\ldots] \tag{7}
$$

In effect, the strategy advocates taking a long (resp. short) position in the
asset whenever it is below (resp. above) its long-term mean, and adjusting the
position size to account for the volatility of $x_t$ and its mean reversion speed $\rho$.
Given basket weights $y$, we apply standard AR estimation procedures on the
in-sample portion of $y^T x$ to recover estimates for $\rho$ and $\sigma$ and plug them directly
in Equation (7). This approach is illustrated for two baskets in Figure 2.
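
One possible reading of "standard AR estimation procedures" is an ordinary least-squares regression of the basket value on its own lag. The sketch below follows that reading under an assumed mean-centered parameterization, $z_{t+1} - \mu = \rho (z_t - \mu) + \sigma \varepsilon_t$, where $z_t = y^T x_t$ is the in-sample basket series; the function name `fit_ar1` and this parameterization are assumptions, not the authors' stated procedure.

```python
import numpy as np

def fit_ar1(z):
    """Least-squares AR(1) fit of a univariate series z.

    Assumed model: z[t+1] - mu = rho * (z[t] - mu) + sigma * eps[t].
    Returns the estimates (mu, rho, sigma).
    """
    z = np.asarray(z, dtype=float)
    X = np.column_stack([np.ones(len(z) - 1), z[:-1]])   # intercept and lagged value
    (c, rho), *_ = np.linalg.lstsq(X, z[1:], rcond=None)
    resid = z[1:] - X @ np.array([c, rho])
    sigma = resid.std(ddof=2)                    # two fitted parameters
    mu = c / (1.0 - rho)                         # long-term mean implied by the fit
    return mu, rho, sigma
```

With in-sample data `x` of shape `(T, n)` and weights `y`, the basket series would be `x @ y`, and the estimates are held fixed during the test phase.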


5.4 Transaction Costs 

We assume that fixed transaction costs are negligible, but that transaction costs
per contract unit are incurred at each trading date. We vary the size of these
costs across experiments to show the robustness of the approaches tested here
to trading cost fluctuations. We let the transaction cost per contract unit vary
between 0.03 and 0.17 cents by increments of 0.02 cents. Since the average
value of a contract over our dataset is about 40 cents, this is akin to considering
trading costs ranging from about 7 to about 40 basis points (BP), that is 0.07
to 0.4%.
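
The conversion from per-contract costs to basis points is simple arithmetic, spelled out below using the 40-cent average contract value quoted above. The exact endpoints are 7.5 and 42.5 BP, which the text rounds to "about 7" and "about 40". Variable names are illustrative.

```python
avg_contract_value = 40.0                        # cents, average over the dataset
# Cost grid: 0.03, 0.05, ..., 0.17 cents per contract unit.
costs = [round(0.03 + 0.02 * i, 2) for i in range(8)]

for c in costs:
    bp = 1e4 * c / avg_contract_value            # 1 BP = 0.01% of the contract value
    print(f"{c:.2f} cents per contract -> {bp:.1f} BP ({bp / 100:.3f}%)")
```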


5.5 Experimental Setup 

We consider 20 sliding windows of one year (255 trading days) taken in the
history, and consider each of these windows independently. Each window is
split between 85% of days to estimate and 15% of days to test-trade our models,
resulting in 38 test-trading days. We do not recompute the weights of the baskets
during the test phase. The 210 stock volatilities (assets) we consider are grouped
into 13 subgroups, depending on the economic sector of their stock. This results
in 13 sector pools whose size varies between 3 assets and 43 assets. We look for
mean-reverting baskets in each of these 13 sector pools.

Figure 2: Three sample trading experiments, using the PCA, sparse
PCA and the Crossing Statistics estimators. [a] Pool of 9 volatility time
series selected using our fast PCA selection procedure. [b] Basket weights
estimated with in-sample data using either the eigenvector of the covariance matrix
with smallest eigenvalue, the smallest eigenvector with a sparsity constraint of
$k = \lfloor 0.5 \times 9 \rfloor = 4$, and the Crossing Statistics estimator with a volatility
threshold of $\nu = 0.2$, i.e. a constraint on the basket’s variance to be larger than 0.2×
the median variance of all assets in the pool. [c, d] Using these 3 procedures, the time series
of the resulting basket price in the in-sample part [c] and out-of-sample part [d]
are displayed. [e] Using the Jurek and Yang [2007] trading strategy results in
varying positions (expressed as units of baskets) during the out-of-sample testing
phase. [f] Transaction costs that result from trading the assets to achieve such
positions accumulate over time. [g] Taking both trading gains and transaction
costs into account, the net wealth of the investor for each strategy can be
computed (the Sharpe ratio over the test period is displayed in the legend). Note how
both sparsity and volatility constraints translate into portfolios composed of fewer
assets, but with a higher variance.

Because all combinations of stocks in each of the 13 sector pools may not
necessarily be mean-reverting, we select smaller candidate pools of $n$ assets through a
greedy backward-forward minimization scheme, where $8 \leq n \leq 12$. To do so, we
start with an exhaustive search of all pools of size 3 within the sector pool, and
proceed by adding or removing an asset using the PCA estimator (the smallest
eigenvalue of the covariance matrix of a set of assets). We use the PCA estimator
in that backward-forward search because it is the fastest to compute. We score
each pool using that PCA statistic, the smaller meaning the better. We generate
up to 200 candidate pools for each of the 13 sector pools. Out of all these
candidate pools, we keep the best 50 in each window, and then use our cointegration
estimation approaches separately on these candidates. One such pool was, for
instance, composed of the stocks {BBY,COST,DIS,GCI,MCD,VDD,VZ,WAG,T}
observed during the year 2006. Figure 2 provides a closeup on that universe of
stocks, and shows the results of three trading experiments using either PCA,
sparse PCA or the Crossing Stats estimator to build trading strategies.
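
The pool selection can be sketched in simplified form. The version below scores a pool by the smallest eigenvalue of its covariance submatrix, starts from the best triple found by exhaustive search, and then only adds assets greedily; the removal (backward) moves and the generation of multiple candidate pools described above are omitted. Function names and this forward-only simplification are assumptions for illustration.

```python
import itertools
import numpy as np

def pca_score(cov, pool):
    """Smallest eigenvalue of the covariance restricted to `pool` (smaller is better)."""
    idx = np.asarray(pool)
    return float(np.linalg.eigvalsh(cov[np.ix_(idx, idx)])[0])

def greedy_pool(cov, size):
    """Forward-only simplification of the backward-forward search."""
    n = cov.shape[0]
    # Exhaustive search over all pools of 3 assets.
    triples = itertools.combinations(range(n), 3)
    pool = list(min(triples, key=lambda c: pca_score(cov, c)))
    # Greedily add the asset that yields the lowest score until `size` is reached.
    while len(pool) < size:
        rest = [i for i in range(n) if i not in pool]
        best = min(rest, key=lambda i: pca_score(cov, pool + [i]))
        pool.append(best)
    return sorted(pool)
```

Since adding an asset can never increase the smallest eigenvalue of the covariance submatrix, the score alone always favors larger pools, which is one reason to cap the pool size as done here.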

5.6 Results 

5.6.1 Robustness of Sharpe Ratios to Costs 

In Figure 3, we plot the average of the Sharpe ratio over the 922 baskets
estimated in our experimental set versus transaction costs. We consider different
PCA settings as well as our three estimators using, in all 3 cases, the variance
bound $\nu$ to be 0.3 times the median of all variances of assets available in a
given asset pool, and the 0-norm to be equal to 0.3 times the size of the
universe (itself between 8 and 12). We observe that Sharpe ratios decrease the
fastest for the naive PCA based method, this decrease being somewhat
mitigated when adding a constraint on the 0-norm of the basket weights obtained
with sparse PCA. Our methods require, in addition to sparsity, enough volatility
to secure sufficient gains. These empirical observations agree with the intuition
of this paper: simple cointegration techniques can produce synthetic baskets
with high mean-reversion, large support and low variance. Trading a portfolio with
low variance which is supported by multiple assets translates in practice into
high trading costs which can damage the overall performance of the strategy.
Both sparse PCA and our techniques manage instead to achieve a trade-off
between desirable mean-reversion properties and, at the same time, control for
sufficient variance and small basket size to allow for lower overall transaction
costs.

5.6.2 Tradeoffs Between Mean Reversion, Sparsity, and Volatility 

In the plots of Figures 4 and 5 this analysis is further detailed by
considering various settings for $\nu$ (volatility threshold) and $k$. To
improve the legibility of these results we summarize, following the
observation in Figure 3 that the relationship between Sharpe ratios and
transaction costs seems almost linear, each of these curves by two numbers:
an intercept level (Sharpe ratio when costs are low) and a slope
(degradation of Sharpe as costs increase). Using these two numbers, we
locate all considered strategies in the intercept/slope plane. We first show
the spectral techniques, PCA and sPCA with different levels of sparsity,
meaning that $k$ is set to $\lfloor u \times d \rfloor$ where
$u \in \{0.3, 0.5, 0.7\}$ and $d$ is the size of the original basket. Each
of the three estimators we propose is studied in a separate plot. For each
we present various results characterized by two numbers: a volatility
threshold $\nu \in \{0, 0.1, 0.2, 0.3, 0.4, 0.5\}$ and a sparsity level
$u \in \{0.3, 0.5, 0.7\}$. To avoid cumbersome labels, we attach an arrow to
each point: the arrow's length in the vertical direction is equal to $u$ and
characterizes the size of the basket, the horizontal length is equal to
$\nu$ and characterizes the volatility level. As can be seen in these 3
plots, an interesting interplay between these two factors allows for a
continuum of strategies that trade mean-reversion (and thus Sharpe levels)
for robustness to cost level.
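
The intercept/slope summary of a Sharpe-versus-cost curve can be sketched as
an ordinary least-squares line fit. The choice of ordinary least squares, the
function name `intercept_and_slope` and the numbers in the example are
assumptions made for illustration; they are not experimental results.

```python
def intercept_and_slope(costs, sharpes):
    """Fit sharpes ~ intercept + slope * costs by ordinary least squares."""
    n = len(costs)
    mean_c = sum(costs) / n
    mean_s = sum(sharpes) / n
    # Centered sums of squares and cross-products.
    sxx = sum((c - mean_c) ** 2 for c in costs)
    sxy = sum((c - mean_c) * (s - mean_s) for c, s in zip(costs, sharpes))
    slope = sxy / sxx
    intercept = mean_s - slope * mean_c
    return intercept, slope


# Made-up curve: the Sharpe ratio loses 0.5 per unit of transaction cost.
costs = [0.0, 1.0, 2.0, 3.0]
sharpes = [2.0, 1.5, 1.0, 0.5]
intercept, slope = intercept_and_slope(costs, sharpes)
```

A strategy with a high intercept and a slope close to zero would combine good
performance at low cost with robustness to increasing costs.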

6 Conclusion 

We have described three different criteria to quantify the amount of mean
reversion in a time series. For each of these criteria, we have detailed a tractable
algorithm to isolate a vector of weights that has optimal mean reversion, while 
constraining both the variance (or signal strength) of the resulting univariate 
series to be above a certain level and its 0-norm to be at a certain level. We 
show that these bounds on variance and support size, together with our new 
criteria for mean reversion, can significantly improve the performance of mean 
reversion statistical arbitrage strategies and provide useful controls to adjust 
mean-reverting strategies to varying trading conditions, notably liquidity risk 
and cost environment. 
[Plot: Average Sharpe Ratio]

Figure 3: Average Sharpe ratio for the Jurek and Yang 2007 trading strategy
captured over about 922 trading episodes, using different basket estimation
approaches. These 922 trading episodes were obtained by considering 7
disjoint time-windows in our market sample, each of a length of about one
year. Each time-window was divided into 85% in-sample data to estimate
baskets, and 15% out-of-sample data to test strategies. On each time-window,
the set of 210
tradable assets during that period was clustered using sectorial information, and 
each cluster screened (in the in-sample part of the time-window) to look for the 
most promising baskets of size between 8 and 12 in terms of mean reversion, by 
choosing greedily subsets of stocks that exhibited the smallest minimal
eigenvalues in their covariance matrices. For each trading episode, the same
universe of
stocks was fed to different mean-reversion algorithms. Because volatility
time-series are bounded and quite stationary, we consider the PCA approach,
which uses the eigenvector with the smallest eigenvalue of the covariance
matrix of the time-series to define a cointegrated relationship. Besides
standard PCA, we also consider sparse PCA eigenvectors with minimal
eigenvalue, with the size $k$ of the support of the eigenvector (the size of
the resulting basket) constrained to be 30%, 50% or 70% of the total number
of considered assets. We also consider the portmanteau, predictability and
crossing stats estimation techniques with variance thresholds of $\nu = 0.2$
and a support whose size $k$ (the number of assets effectively traded) is
targeted to be about 30% of the size of the considered universe (itself
between 8 and 12). As can be seen in the figure, the Sharpe ratios of all
trading approaches decrease with an increase in transaction costs. One
expects sparse baskets to perform better under the assumption that costs are
high, and this is indeed observed here. Because the relationship between
Sharpe ratios and transaction costs can be efficiently summarized as being a
linear one, we propose in the plots displayed in Figures 4 and 5 a way to
summarize the lines above with two numbers each: their intercept (Sharpe
level in the quasi-absence of costs) and slope (degradation of Sharpe as
costs increase). This visualization is useful to observe how sparsity
(basket size) and volatility thresholds influence the robustness to costs of
the strategies we propose, and more generally how performance is influenced
by these parameter settings.

Figure 4: Relationships between Sharpe in a low cost setting (intercept) on
the x-axis and robustness of Sharpe to costs (slope of the Sharpe/costs
curve) for different estimators implemented with varying volatility levels
$\nu$ and sparsity levels $k$ parameterized as a multiple of the universe
size. Each colored square in the figures above corresponds to the
performance of a given estimator (Portmanteau in subfigure (a),
Predictability in subfigure (b)) using different parameters for
$\nu \in \{0, 0.1, 0.2, 0.3, 0.4, 0.5\}$ and $u \in \{0.3, 0.5, 0.7\}$. The
parameters used for each experiment are displayed using an arrow whose
vertical length is proportional to $\nu$ and horizontal length is
proportional to $u$.

Figure 5: Same setting as Figure 4, using the crossing statistics (c).